Computationally Efficient Speaker Identification for Large Population Tasks using MLLR and Sufficient Statistics
Authors
Abstract
In conventional speaker identification using the GMM-UBM framework, the likelihood of a given test utterance is computed with respect to every speaker model, and the speaker is then identified by the maximum-likelihood criterion. Computing these likelihood scores is computationally intensive, especially when the database contains tens of thousands of speakers. In this paper, we propose a computationally efficient (Fast) method that calculates the likelihood of the test utterance using precomputed speaker-specific Maximum Likelihood Linear Regression (MLLR) matrices together with sufficient statistics that are estimated from the test utterance only once. We show that this method is an order of magnitude faster, but at the cost of some degradation in performance. We therefore propose a cascaded system in which the Fast MLLR system identifies the top-N most probable speakers, and a conventional GMM-UBM system then identifies the most probable speaker among those top-N speakers. Experiments on the NIST 2004 database indicate that the cascaded system provides speed-ups of 3.16 and 6.08 times for the 1-side test (core condition) and the 10-second test condition, respectively, with a marginal degradation in accuracy relative to the conventional GMM-UBM system.
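The cascaded search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cascade_identify`, the callable `slow_score_fn`, and the toy score values are all hypothetical, and the fast scores are simply assumed to have been computed already (for example by an MLLR-based approximation). The sketch only shows the control flow: shortlist the top-N speakers with the cheap score, then rescore that shortlist with the expensive one.

```python
import numpy as np


def cascade_identify(fast_scores, slow_score_fn, top_n):
    """Two-stage speaker search (illustrative sketch, hypothetical API).

    fast_scores   : one cheap approximate log-likelihood per enrolled speaker
    slow_score_fn : callable mapping a speaker index to an expensive,
                    more accurate log-likelihood (e.g. full GMM-UBM scoring)
    top_n         : number of candidates passed on to the second stage
    """
    fast_scores = np.asarray(fast_scores, dtype=float)
    top_n = min(top_n, fast_scores.size)
    # Stage 1: keep the top_n speakers ranked by the cheap score.
    shortlist = np.argsort(fast_scores)[::-1][:top_n]
    # Stage 2: run the expensive scorer on the shortlist only.
    slow_scores = np.array([slow_score_fn(int(s)) for s in shortlist])
    return int(shortlist[np.argmax(slow_scores)])


# Toy data (made up): the fast scores rank speaker 1 first,
# while the exact scores favour speaker 3.
exact = np.array([-10.0, -3.0, -7.0, -2.5, -9.0])
approx = np.array([-9.0, -2.0, -8.0, -3.5, -11.0])

calls = []  # records which speakers the expensive scorer was asked about


def slow(speaker):
    calls.append(speaker)
    return exact[speaker]


best = cascade_identify(approx, slow, top_n=2)
print(best, calls)
```

In this toy run the expensive scorer is called for only two of the five speakers, which is where the saving comes from. The choice of N trades speed against the risk that the true speaker is missing from the shortlist.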
Similar resources
Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework
Anchor modeling technique has been shown to be useful in reducing computational complexity for speaker identification and indexing of large audio database. In this technique, speakers are projected onto a talker space spanned by a set of predefined anchor models which are usually represented by Gaussian Mixture Models (GMMs). The characterization of each speaker involves calculation of likeliho...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Eigen-Voice Based Anchor Modeling System for Speaker Identification Using MLLR Super-Vector
In this paper, we propose an anchor modeling scheme where instead of conventional “anchor” speakers, we use eigenvectors that span the Eigen-voice space. The computational advantage of conventional Anchor-modeling based speaker identification system comes from representing all speakers in a space spanned by a small number of anchor speakers instead of having separate speaker models. The convent...
Unsupervised speaker adaptation based on sufficient HMM statistics of selected speakers
This paper describes an efficient method for unsupervised speaker adaptation. This method is based on (1) selecting a subset of speakers who are acoustically close to a test speaker, and (2) calculating adapted model parameters according to the previously stored sufficient HMM statistics of the selected speakers’ data. In this method, only a few unsupervised test speaker’s data are required for...